Self-organization provides a suitable paradigm for developing self-managed complex computing systems,
e.g., decision support systems. Towards this end, in this paper, a composite self-organization mechanism in 
an agent network is proposed. To intuitively elucidate this mechanism, a task allocation environment is sim- 
ulated. Based on self-organization principles, this mechanism enables agents to dynamically adapt relations 
with other agents, i.e., to change the underlying network structure, so as to achieve efficient task allocation. 
The proposed mechanism utilizes a trust model to assist agents in reasoning with whom to adapt relations 
and employs a multi-agent Q-learning algorithm for agents to learn how to adapt relations. Moreover, in 
this mechanism, it is considered that the agents are connected by weighted relations, instead of crisp rela- 
tions. The proposed mechanism is evaluated through a comparison with a centralized mechanism and the 
K-Adapt mechanism in both closed and open agent networks. Experimental results demonstrate the adequate 
performance of the proposed mechanism in terms of the entire network profit and time consumption. Addi- 
tionally, a potential application scenario of this mechanism is also given, which exhibits the potential appli- 
cability of this mechanism in some real world cases. 

© 2012 Elsevier B.V. All rights reserved. 



1. Introduction 

Nowadays, more and more large and complex computing systems 
are emerging, e.g., decision support systems. It is highly desirable
for these systems to be autonomic, i.e., capable of self-management,
because self-managing systems save the labor time of human managers,
adapt to environmental changes and ensure their own survivability [26].
Within this context,
self-organization, which is defined as "the mechanism or the process
enabling the system to change its organization without explicit external
command during its execution time" [34], can be used in such systems
to achieve autonomy [29,31,37,47]. For example, Smirnov et al. [37] 
integrated several technologies, e.g., ontology management, context 
management, constraint satisfaction, etc., to implement a self- 
organizing decision support system as a set of Web-services. On the 
other hand, De Wolf and Holvoet [7] recommended that agent- 
based modeling is best suited to build such autonomic systems. 
Thus, we consider that self-organizing multi-agent systems are good 
choices for developing such autonomic systems, as the self- 
organizing systems can continuously arrange and rearrange their or- 
ganizational structures autonomously, without any external control, 
so as to adapt to environmental changes. In addition, the adaptation 
process should be performed in a decentralized manner, so that the
autonomic systems could be robust against failures of any compo- 
nents in the system. With this motivation, this paper explores the 
area of self-organization in systems of autonomous agents and, in
particular, focuses on structural adaptation in agent organized
networks. Based on Kota et al.'s description [28], any self-organizing
system should have the following three properties:

1. No external control. All of the adaptation processes should be ini- 
tiated internally and change only the internal state of the system. 

2. Dynamic and continuous operation. The system is expected to evolve 
with time and thus, the self-organization process should be 
continuous. 

3. No central control. The self-organization process should be operat- 
ed through only local interactions of individual components in the 
system without centralized guidance. 

Here, we are primarily interested in cooperative agent organized 
networks, which comprise cooperative autonomous problem- 
solving agents that receive inputs, perform tasks and return results, 
because such agent organized networks can clearly model the essence 
of complex computing systems. Moreover, the structure of an agent 
network is a manifestation of the relations between the agents, 
which in turn, determine their interactions. Therefore, structure ad- 
aptation signifies changing relations between agents, which then re- 
directs their interactions. To make the problem clear, which we 
attempt to address in this paper, let us consider a simple example. 
Fig. 1 demonstrates an agent network. Consider that agent $a_1$ receives
a task $\varphi$, which can only be executed by agent $a_6$. Then, the
best option for agent $a_1$ is to forward task $\varphi$ to its neighbor, agent
$a_4$, which in turn forwards task $\varphi$ to agent $a_5$, until task $\varphi$ reaches
agent $a_6$. Now, if this case occurs several times, one would expect
performance to improve if $a_1$ adds $a_6$ as one of its neighbors, because this
will save the unnecessary forwarding overhead. Similarly, if $a_4$ rarely sends any
request to its neighbor $a_2$, then removing $a_2$ from $a_4$'s neighbors
will reduce the computational overhead associated with taking
agent $a_2$ into account whenever $a_4$ is making a decision.

Against this background, a composite self-organization mecha- 
nism in an agent network is proposed in this paper. Following self- 
organization principles, this mechanism is a decentralized and con- 
tinuous process, which is followed by every agent to decide with 
whom and how to adapt its relations with others, based only on 
local available information. Furthermore, this mechanism involves 
only changing the structural relations between the agents and does 
not need agents to change their internal properties, e.g., resources 
and capacities. Likewise, this mechanism does not need new agents 
to be created, e.g., an overloaded agent creating a new agent and 
assigning part of its load to the new agent, or existing agents to be re- 
moved, e.g., agents, which are idle for a long time being removed. 
Therefore, this mechanism can be used as a self-management tool in 
some autonomic systems where creating new agents and removing 
existing agents are either not permitted or difficult to realize. Accord- 
ing to this mechanism, pairs of agents are able to continuously and lo- 
cally evaluate their inter-relations based on their past interactions. 
Every pair of agents can calculate the reward of each possible relation
between them and cooperatively choose the one that maximizes
the sum of their rewards. In addition, this mechanism can be operated
in open agent networks as well, where new agents can join and
existing agents can leave. In comparison with current related works, 
which mainly focused on how to adapt crisp relations between 
agents, this mechanism originally integrates two notions into self- 
organization, i.e., trust and weighted relation. A trust model can be 
used by each agent to select the most valuable agents to adapt rela- 
tions, and a weighted relation is used to measure how strong the re- 
lation is between two agents. Furthermore, for relation adaptation, 
most current related algorithms considered only one type of relations, 
while our relation adaptation algorithm can handle multiple types of 
relations. Kota et al.'s [27] algorithm can also deal with multiple types
of relations. The main differences between our algorithm and theirs are
that (1) our algorithm can handle weighted relations, while Kota et
al.'s algorithm was designed for crisp relations only; and (2) our
algorithm balances exploration and exploitation, whereas Kota et
al.'s algorithm overlooks this balance and always chooses the most
beneficial action at each step, which may let their algorithm converge
to a suboptimal state, as stated by Kaelbling et al. [21].

In the next section, we first review related research in the self- 
organization area and through comparison, we point out the merits 
of our self-organization mechanism. Then, we develop a cooperative 
agent network model, which serves as an abstract platform, in 
Section 3, and based on this platform, our composite self-organization
mechanism is devised in Section 4. By employing a general
platform, i.e., task allocation in an agent network, instead of a
particular existing system, we can develop a general mechanism
that can be potentially applied to a wide variety of applications.
After describing the self-organization mechanism, in Section 5, exper- 
imental results and analysis of the mechanism are presented. To dem- 
onstrate the potential applicability of the mechanism, in Section 6, we 
provide a particular scenario and show how to apply the mechanism
to this scenario. Finally, this paper is concluded in Section 7.

2. Related work 

Self-organization can be generated in multi-agent systems in several
ways [3,33]. Serugendo et al. [33] divided self-organization
mechanisms into five classes.

1. Direct interactions between agents using basic principles such as 
broadcast and localization. Typical examples of such mechanisms 
are those applied in the areas of self-assembly and distributed 
self-localization, where the formation of regular spatial patterns 
in mobile objects is required. For example, Mamei et al. [30] used 
a leader election algorithm to determine the center of gravity of 
the objects and propagate this information to all objects which 
keep moving until a specific distance from the center is reached. 

2. Indirect interactions between agents, i.e., stigmergy, which means 
indirect coordination through traces left in the environment. Such 
mechanisms, based on the idea of stigmergy, have been applied to 
manufacturing control [24], supply network management [32], 
and so on. 

3. Reinforcement of agent behaviors. These reinforcement self- 
organization mechanisms are based on the capabilities of the 
agents to dynamically modify their behavior according to some 
perceived states of the environment. For instance, Weyns et al. 
[6] presented a model that focuses on dynamically adapting logical 
relations between different behaviors, represented by roles. An 
agent, starting from its current state, can successively follow this 
model to complete its tasks. 

4. Cooperation behavior of individual agents. A typical example of 
these mechanisms is organization self-design, which uses the 
primitives of agents’ composition and decomposition, e.g., Kamboj 
and Decker [22] developed an organizational self-design mecha- 
nism in semi-dynamic environments, which enables an agent to 
be divided into two agents to respond to overwhelming environ- 
mental demands and enables two agents to merge into one agent 
if communication overhead between the two agents is too high. 

5. Generic architectures or meta-models. The self-organization 
mechanisms, based on generic architectures or meta models of 
the agents organization, are instantiated and subsequently dynam- 
ically modified as needed according to the requirements of a par- 
ticular application. A typical example is PROSA [4], which is 
based on a holonic hierarchy model. Self-organization refers to al- 
tering the holonic hierarchy following perturbations of the agent 
environment by using a known decision making technique. 

However, most of the self-organization mechanisms, summarized 
in [33], are not applicable to an explicitly modeled agent network, as 
they are based on reactive agents interacting in unstructured ways. 
Some of the few mechanisms that do consider underlying agent net- 
work structures are centralized in nature, requiring some specialized 
agents to manage the adaptation process. In 2001, Horling et al. [17] 
proposed an organization structure adaptation method, which 
employed a central blackboard, by using self-diagnosis. Their method 
involves a diagnostic subsystem for detecting faults in the organiza- 
tion. Then, against these faults, some fixed pre-designed reorganiza- 
tion steps are launched. Hubner et al. [18] presented a controlled 
reorganization mechanism that is a top-down approach, in which a 
specialized group of agents perform the reorganization process for 
the whole organization. Bou et al. [5] developed a centralized reorga- 
nization mechanism, in which the central authority named 
“autonomic electronic institution” modifies the norms of the institution
to achieve institutional goals. Hoogendoorn [16] presented an
approach based on max flow networks to dynamically adapt organi- 
zational models to environmental fluctuation. His approach assumed 
two special nodes in the network, namely the source node with an 
indegree of 0 and the sink node with an outdegree of 0. These two spe- 
cial nodes make the approach centralized in nature, since if these two 
nodes are out of order, the organization adaptation process might be 
impacted. Hermoso et al. [15] developed a coordination artifact in an 
open multi-agent system, which is used to build and evolve a role tax- 
onomy model over time. Their role taxonomy model, acting as an in- 
formation center, is available to all the agents and helps them find 
valuable partners to improve their individual utilities. Hence, the ap- 
proaches proposed in [5,15-18] are centralized in nature and have 
the potential of the single point of failure. 

Self-organization mechanisms focusing on network structural adap- 
tation (namely adapting relations among agents), which are used to im- 
prove team formation or task allocation, have also been researched 
[1,9,10,12,27]. Gaston and desJardins [9] developed two network structural
adaptation strategies for dynamic team formation. Their first strategy
was a structure-based approach, where an agent prefers to form
a connection with the agent that has more neighbors. Their
second strategy was a performance-based approach, where an agent 
prefers to form a connection with the agent who has better perfor- 
mance. The two strategies are suitable in different situations. Clinton 
et al. [10] empirically analyzed the drawback of the structure-based 
strategy proposed in [9], and then designed a new network adaptation 
strategy, which limits the maximum number of links an agent can 
have. Abdallah and Lesser [1] did further research in self-organization 
of agent networks and creatively used reinforcement learning to adapt 
the network structure. Their method enables agents to not only adapt 
the underlying network structure during the learning process but also 
use information from learning to guide the adaptation process. Griffiths 
and Luck [12] presented a tag-based mechanism for supporting cooper- 
ation in the presence of cheaters by enabling individual agents to change 
their neighborhoods. Griffiths and Luck's mechanism is very suitable in 
some special dynamic environments where trust or reputation among 
agents is difficult to build up. However, these network structural adapta- 
tion methods, proposed in [1,9,10,12], assumed that only one type of re- 
lation exists in the network and the number of neighbors possessed by 
an agent has no effect on its local load. These assumptions are impracti- 
cal in some cases where multiple relations exist among agents in a net- 
work and agents have to expend resources to manage their relations 
with other agents. To overcome this disadvantage, Kota et al. [27] de- 
vised a network structural adaptation mechanism, which took multiple 
relations and relation management load into account. The relation adaptation
algorithm adopted in their mechanism lets agents take the actions
that maximize the utility at each step. Nevertheless, as stated by
Kaelbling et al. [21], this kind of algorithm, which always takes the
highest-utility action, overlooks the tradeoff between exploitation and
exploration and may finally converge to a sub-optimal state. Furthermore,
Kota et al.'s mechanism is somewhat biased, because the agent
that initiates the relation adaptation process, rather than the agent
that is requested to adapt a relation, evaluates
most attributes of the reward of changing a relation. This bias might
cause agents to be somewhat subjective when making decisions.
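
The tradeoff noted by Kaelbling et al. [21] can be illustrated with a minimal sketch. It contrasts a purely greedy choice with ε-greedy selection, which is one common way to balance exploration and exploitation and is used here only for illustration; it is not claimed to be the selection rule of either mechanism. The action names, the utility values and the function names `greedy` and `epsilon_greedy` are assumptions.

```python
import random


def greedy(utilities):
    """Always exploit: return the action with the highest estimated utility."""
    return max(utilities, key=utilities.get)


def epsilon_greedy(utilities, epsilon, rng):
    """With probability epsilon explore a uniformly random action, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(sorted(utilities))
    return greedy(utilities)


# Hypothetical utility estimates for three candidate actions.
utilities = {"enh_peer": 0.2, "wkn_peer": 0.7, "no_action": 0.1}
rng = random.Random(0)

always_greedy = {greedy(utilities) for _ in range(100)}
explored = {epsilon_greedy(utilities, 0.3, rng) for _ in range(100)}
```

A greedy agent never revisits `enh_peer` or `no_action`, so a poor early utility estimate for either of them can never be corrected; an exploring agent occasionally re-samples them.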

Besides the disadvantages mentioned above, the common limitation 
of current related research is that candidate selection for self- 
organization, i.e., relation adaptation, is simplified. Intuitively, before 
adapting relations, agents should first decide with whom to adapt a re- 
lation and this course is known as candidate selection. For candidate se- 
lection, agents in current related work use only their own experience. 

In addition, current self-organization mechanisms consider only 
crisp relations between agents, which might be another limitation. 
Here, a crisp relation means that between two agents there is either a
relation or no relation. To overcome this limitation, weighted relations
are used in this paper, which means that between two agents there is a
relation strength, ranging over [0, 1], that indicates how strong the
relation is. The introduction of weighted relations is reasonable
because, in the real world, the relation between
two persons usually changes gradually rather than suddenly. Thus,
weighted relations should be more flexible and more suitable in
agent networks than crisp relations.

Against this background, in this paper, we first model a task- 
solving agent network and then propose a composite self- 
organization mechanism. This self-organization mechanism consists 
of three elements, which are claimed as a three-fold contribution of 
this paper. The first one is that, for candidate selection, we integrate 
a trust model to enable agents to use not only their own experience 
but also other agents' opinions to select candidates, which can make 
agents select the most valuable candidates to adapt relations. The sec- 
ond one is that, for adapting multiple relations, we develop a multi- 
agent Q-learning algorithm. This algorithm enables two agents to in- 
dependently evaluate their rewards about adapting relations and bal- 
ances exploitation and exploration. Consequently, our mechanism 
could overcome the aforementioned flaws of Kota et al.'s mechanism 
[27]. The last one is that, in contrast with current related approaches, 
which consider only crisp relations, we introduce weighted relations 
into our self-organization mechanism. The introduction of weighted 
relations can improve the performance of our self-organization mech- 
anism (shown in Section 5) and make the mechanism more suitable 
in dynamic environments. 

3. The agent network model 

There are several existing frameworks for modeling agent organizations
or networks, such as [42] and [35]. The focus of these frameworks
is on how to automatically build an agent organization or an agent network
based on users' requirements. These frameworks [35,42] did not
consider self-organization of the established agent organizations or 
agent networks. In this paper, our focus is on self-organization, which 
requires that agent networks can autonomously and dynamically 
adapt their structures at run-time when the environment changes. 
Hence, these frameworks cannot be utilized in this paper. In this section, 
an agent network model is proposed which concentrates on how to 
self-adapt an existing agent network structure to achieve efficient task 
allocation. The agent network model describes how the network is organized
and the features of the network. How the network is operated,
i.e., how to choose candidates for relation adaptation and how to adapt
relations with these candidates, is described in the next section.

The agent network model proposed in this paper is based on the 
model devised by Kota et al. [27], i.e., defining multiple relations in an 
agent network. Our agent network model, on the other hand, extends 
Kota et al.'s model in two aspects. The first aspect is that instead of 
crisp relations used by Kota et al., we define weighted relations, which 
connect agents in the network. The second aspect is that we employ 
and extend Semi-Markov Decision Process (SMDP) to model the multi- 
agent environment, while Kota et al.'s model did not include such a 
mathematical framework. We use SMDP to model the agent network, 
because SMDP is useful for studying various optimization problems,
which can be solved by reinforcement learning [39]. Thus, we can then
conveniently develop a multi-agent Q-learning algorithm in this model.

3.1. Weighted relations of agents 

In our model, an agent network comprises a set of collaborative
agents, i.e., $A = \{a_1, \ldots, a_n\}$, situated in a distributed task allocation
environment. In the agent network, instead of crisp relations, two
weighted relations are defined, namely the peer-to-peer relation and the
subordinate-superior relation. The formal definitions of
these two relations and of relation strength are given as follows.
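Definition 1. peer-to-peer

A peer-to-peer relation, denoted as "$\sim$" ($\sim \subseteq A \times A$), is a Compatible
Relation, which is reflexive and symmetric, such that
$\forall a_i \in A: a_i \overset{\rho_{ii}}{\sim} a_i$ and
$\forall a_i, a_j \in A: a_i \overset{\rho_{ij}}{\sim} a_j \Rightarrow a_j \overset{\rho_{ji}}{\sim} a_i$, where $\rho_{ij} = \rho_{ji}$.

Definition 2. subordinate-superior

A subordinate-superior relation, written as "$\prec$" ($\prec \subseteq A \times A$), is a
Strict Partial Ordering Relation, which is irreflexive, asymmetric and
transitive, such that $\forall a_i \in A: \neg(a_i \prec a_i)$,
$\forall a_i, a_j \in A: a_i \overset{\rho_{ij}}{\prec} a_j \Rightarrow \neg(a_j \overset{\rho_{ji}}{\prec} a_i)$
and $\forall a_i, a_j, a_k \in A: a_i \overset{\rho_{ij}}{\prec} a_j \wedge a_j \overset{\rho_{jk}}{\prec} a_k \Rightarrow a_i \prec a_k$.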

Definition 3. relation strength

Relation strength, denoted as $\rho_{ij}$, indicates how strong the relation is
between agents $a_i$ and $a_j$. $\rho_{ij}$ ranges over [0, 1], where a higher value
means a stronger relation and 0 indicates no relation between the two
agents.

Relation strength affects the task allocation process, as agents usu- 
ally prefer to allocate tasks to those agents which have high relation 
strength with them. Initially, the relation strength between any two 
neighboring agents is set to 0.5 by default, but it might be adapted 
during the succeeding task allocation process. 
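
One way to hold the two weighted relations in code is sketched below. This is only an illustration under stated assumptions: the class `WeightedRelations`, its method names and the string agent identifiers are hypothetical and do not come from the paper. The sketch stores one strength per peer pair, so that the strength is the same in both directions, rejects reflexive or mutually inverse subordinate-superior links, and uses the default strength of 0.5 mentioned above.

```python
from dataclasses import dataclass, field

DEFAULT_STRENGTH = 0.5  # initial relation strength stated in the text


@dataclass
class WeightedRelations:
    """Hypothetical container for weighted relations between agents."""
    peers: dict = field(default_factory=dict)      # frozenset({a, b}) -> strength
    superiors: dict = field(default_factory=dict)  # (subordinate, superior) -> strength

    def add_peer(self, a, b, strength=DEFAULT_STRENGTH):
        self._check(strength)
        # A single entry per unordered pair keeps the strength symmetric.
        self.peers[frozenset((a, b))] = strength

    def add_superior(self, sub, sup, strength=DEFAULT_STRENGTH):
        self._check(strength)
        if sub == sup:
            raise ValueError("subordinate-superior relation is irreflexive")
        if (sup, sub) in self.superiors:
            raise ValueError("subordinate-superior relation is asymmetric")
        self.superiors[(sub, sup)] = strength

    def strength(self, a, b):
        """Strength of the direct relation between a and b, or 0.0 if none."""
        key = frozenset((a, b))
        if key in self.peers:
            return self.peers[key]
        return self.superiors.get((a, b), self.superiors.get((b, a), 0.0))

    @staticmethod
    def _check(strength):
        if not 0.0 <= strength <= 1.0:
            raise ValueError("relation strength must lie in [0, 1]")


net = WeightedRelations()
net.add_peer("a1", "a2")             # default strength 0.5
net.add_superior("a5", "a1", 0.8)    # a5 is a subordinate of a1
```

Transitivity is deliberately not materialized: only direct links carry a strength, so indirectly related agents report a strength of 0.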

According to the above definitions, there are three neighborhood
types, which are peer, subordinate and superior neighbors. Here is an
example to explain the relations defined above. If agent $a_i$ is a peer
of $a_j$, written as $a_i \overset{\rho_{ij}}{\sim} a_j$, then $a_j$ is also a peer of $a_i$, written as $a_j \overset{\rho_{ji}}{\sim} a_i$,
where $\rho_{ij} = \rho_{ji}$. If agent $a_i$ is a subordinate of $a_j$, i.e., $a_j$ is a superior of
$a_i$, written as $a_i \overset{\rho_{ij}}{\prec} a_j$, and $a_j$ is a subordinate of $a_k$, then $a_i$ is a
subordinate of $a_k$ as well. Since $a_i$ is not a directly linked neighbor of $a_k$, there
is no relation strength between them. For convenience, we stipulate
that relation $\succ$ is the reverse relation of $\prec$, namely
$a_i \overset{\rho_{ij}}{\prec} a_j \Leftrightarrow a_j \overset{\rho_{ji}}{\succ} a_i$, where $\rho_{ij} = \rho_{ji}$. As the agent network model
operates in a cooperative environment, it is assumed that the relation is
mutual between the concerned agents and that both concerned agents
respect the relation type and relation strength between them. To
clearly understand the meanings of relations and relation strength, 
here is a real-world example. In the human real-world, the peer-to- 
peer relation can be seen as the friendship relation, while the subordi- 
nate-superior relation can be considered the employee-employer re- 
lation. The relation strength indicates how strong the relation is 
between two persons. For instance, if two persons are friends, they 
could be general friends, good friends or very good friends. This dif- 
ference can be demonstrated by using relation strength. Thus, our 
agent network model could be applied to model a social network. Al- 
though in a social network, more than two types of relations may 
exist, our model can be easily extended by defining more relations. 



3.2. SMDP model for the agent network 

The standard Semi-Markov Decision Process (SMDP) [40] provides a
basic approach for modeling an agent's decision if the agent's actions
may change at different states. In this paper, however, we need to
model an open agent network in which the number of agents
varies. We thus extend the standard SMDP in order to model a multi-agent
environment, so that the open agent network is modeled as a
tuple $\langle n(t), S, Act_{1,\ldots,n(t)}, T, R_{1,\ldots,n(t)} \rangle$, where $n(t)$ is the number of
agents in the open agent network at time $t$, $S$ is the state space, whose
specific value at time $t$ is written as $s(t)$ with $s(t) \in S$, and $Act_i(s(t))$ is the
set of actions available to agent $a_i$ at a given state $s(t)$. The transition
function $T$ is defined as $T: S \times Act_1 \times \ldots \times Act_{n(t)} \rightarrow S$, which describes the
resulting state $s(t+1) \in S$ when each agent takes one of its available
actions at state $s(t)$. $R_i$ is the reward function for agent $a_i$,
$R_i: S \times Act_1 \times \ldots \times Act_{n(t)} \rightarrow \Re_i$, and $\Re_i$ is the immediate reward obtained
by agent $a_i$ when each agent takes one of its available actions at
state $s(t)$. The details of the reward will be described in Section 4.2. We
first provide formal definitions of State, Action and Action Set.

Definition 4. State 

A specific state at time $t$ is defined as a tuple
$s(t) = \langle Neig_1(t), \ldots, Neig_{n(t)}(t) \rangle$, $s(t) \in S$. The element $Neig_i(t)$ is the neighbor set of agent
$a_i$ at time $t$. $Neig_i(t)$ can be further divided into three subsets, $Neig_i^{\sim}(t)$,
$Neig_i^{\prec}(t)$ and $Neig_i^{\succ}(t)$. $Neig_i^{\sim}(t)$ contains the peers of $a_i$, $Neig_i^{\prec}(t)$
consists of the direct superiors of $a_i$, and $Neig_i^{\succ}(t)$ comprises the direct
subordinates of $a_i$.

Let us consider an example. In Fig. 2, agent $a_1$ maintains a peer-to-peer
relation with its two neighbors, $a_2$ and $a_3$, so that
$Neig_1^{\sim}(t) = \{a_1, a_2, a_3\}$ (since the peer-to-peer relation, $\sim$, is reflexive, $a_1$ is a peer of $a_1$
itself). Agent $a_1$ also has two direct subordinates, $a_5$ and $a_7$, so that
$Neig_1^{\succ}(t) = \{a_5, a_7\}$. Additionally, $a_1$ has a direct superior, $a_4$, so that
$Neig_1^{\prec}(t) = \{a_4\}$. Therefore,
$Neig_1(t) = Neig_1^{\sim}(t) \cup Neig_1^{\prec}(t) \cup Neig_1^{\succ}(t) = \{a_1, a_2, a_3, a_4, a_5, a_7\}$.
The number alongside each line indicates
the relation strength between two agents. For example, the relation
strength between agents $a_1$ and $a_5$ is 0.8.
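
The neighbor sets of this example can be written down directly, which makes the partition of a neighborhood into its three subsets concrete. The following is a minimal sketch: the type name `Neighbourhood`, its field names and the string identifiers are assumptions, and only the neighborhood of agent $a_1$ is filled in because the text does not list the others.

```python
from typing import FrozenSet, NamedTuple


class Neighbourhood(NamedTuple):
    """Hypothetical record of one agent's neighbor set, split by relation type."""
    peers: FrozenSet[str]
    superiors: FrozenSet[str]
    subordinates: FrozenSet[str]

    def union(self) -> FrozenSet[str]:
        """The full neighbor set: peers, direct superiors and direct subordinates."""
        return self.peers | self.superiors | self.subordinates


# Neighborhood of agent a1 as described for Fig. 2.
neig_1 = Neighbourhood(
    peers=frozenset({"a1", "a2", "a3"}),   # includes a1 itself (reflexivity)
    superiors=frozenset({"a4"}),
    subordinates=frozenset({"a5", "a7"}),
)

# A state is a collection of such records, one per agent; only a1 is known here.
state = {"a1": neig_1}

print(sorted(neig_1.union()))  # ['a1', 'a2', 'a3', 'a4', 'a5', 'a7']
```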

Definition 5. Action 

An action is defined as a decision made by an agent to adapt a re- 
lation with another agent. 

There are seven different atomic actions defined in our model,
which are $enh\_{\sim}$, $enh\_{\prec}$, $enh\_{\succ}$, $wkn\_{\sim}$, $wkn\_{\prec}$, $wkn\_{\succ}$ and $no\_action$.
The explanation of each action and its reverse action is given in
Table 1. It should be noted that the actions $enh\_$ and
$wkn\_$ mean not only to enhance and weaken a relation but also to form
and dissolve a relation, respectively.

In addition, the combinations of atomic actions include $wkn\_{\prec} + enh\_{\sim}$,
$wkn\_{\prec} + enh\_{\succ}$, $wkn\_{\sim} + enh\_{\prec}$, $wkn\_{\sim} + enh\_{\succ}$, $wkn\_{\succ} + enh\_{\sim}$
and $wkn\_{\succ} + enh\_{\prec}$. The meanings of these combined actions can be
easily deduced from Table 1. For example, the combination $wkn\_{\prec} + enh\_{\sim}$,
taken by $a_i$ with $a_j$, implies that $a_i$ first removes $a_j$ from
$a_i$'s superiors and then forms a peer relation with $a_j$. It should be noted that
an agent at different states might possess different available actions.

The possible choices of actions available to agents in different
situations are illustrated as follows.

1. There is no relation between agents $a_i$ and $a_j$. The possible choices
of actions include $enh\_{\sim}$, $enh\_{\prec}$, $enh\_{\succ}$ and $no\_action$.

2. $a_i$ is a peer of $a_j$, i.e., $a_i \sim a_j$. The possible actions involve $wkn\_{\sim}$,
$wkn\_{\sim} + enh\_{\prec}$, $wkn\_{\sim} + enh\_{\succ}$ and $no\_action$.

3. $a_i$ is a subordinate of $a_j$, i.e., $a_i \prec a_j$. The possible actions include
$wkn\_{\prec}$, $wkn\_{\prec} + enh\_{\sim}$, $wkn\_{\prec} + enh\_{\succ}$ and $no\_action$. These actions
are based on $a_i$'s perspective, while, in $a_j$'s view, $a_j$ needs to
reverse these actions.

4. $a_i$ is a superior of $a_j$, i.e., $a_j \prec a_i$. This situation is just the reverse
condition of $a_i \prec a_j$.
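
The four situations above amount to a lookup from the current relation to the admissible actions. A minimal sketch follows; the ASCII action names are assumptions introduced for illustration, with `peer`, `sub` and `sup` standing for the peer, subordinate and superior relation symbols, and the entry for the fourth situation is obtained by reversing the third.

```python
# Current relation of a_i towards a_j (hypothetical labels).
NONE, PEER, SUBORDINATE, SUPERIOR = "none", "peer", "subordinate", "superior"

# Available actions from a_i's point of view, following the four situations.
AVAILABLE_ACTIONS = {
    NONE: ["enh_peer", "enh_sub", "enh_sup", "no_action"],
    PEER: ["wkn_peer", "wkn_peer+enh_sub", "wkn_peer+enh_sup", "no_action"],
    SUBORDINATE: ["wkn_sub", "wkn_sub+enh_peer", "wkn_sub+enh_sup", "no_action"],
    # Reverse of the SUBORDINATE case: a_i is the superior of a_j.
    SUPERIOR: ["wkn_sup", "wkn_sup+enh_peer", "wkn_sup+enh_sub", "no_action"],
}


def available_actions(relation):
    """Return a copy of the actions a_i may take given its relation to a_j."""
    return list(AVAILABLE_ACTIONS[relation])


print(available_actions(PEER))
```

Every situation keeps `no_action` available, so an agent is never forced to change a relation.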




Fig. 2. An example agent network at time t (link types: peer-to-peer and subordinate-superior).
Table 1
Explanation of each action.

| Action (in $a_i$'s view) | Explanation | Reverse action (in $a_j$'s view) |
|---|---|---|
| $a_i\ enh\_{\prec}\ a_j$ | $a_i$ enhances the subordinate relation with $a_j$, or $a_i$ becomes a subordinate of $a_j$ | $a_j\ enh\_{\succ}\ a_i$ |
| $a_i\ enh\_{\sim}\ a_j$ | $a_i$ enhances the peer relation with $a_j$, or $a_i$ becomes a peer of $a_j$ | $a_j\ enh\_{\sim}\ a_i$ |
| $a_i\ enh\_{\succ}\ a_j$ | $a_i$ enhances the superior relation with $a_j$, or $a_i$ becomes a superior of $a_j$ | $a_j\ enh\_{\prec}\ a_i$ |
| $a_i\ wkn\_{\prec}\ a_j$ | $a_i$ weakens the subordinate relation with $a_j$, or $a_i$ dissolves a superior, $a_j$ | $a_j\ wkn\_{\succ}\ a_i$ |
| $a_i\ wkn\_{\sim}\ a_j$ | $a_i$ weakens the peer relation with $a_j$, or $a_i$ dissolves a peer, $a_j$ | $a_j\ wkn\_{\sim}\ a_i$ |
| $a_i\ wkn\_{\succ}\ a_j$ | $a_i$ weakens the subordinate relation with $a_j$, or $a_i$ dissolves a subordinate, $a_j$ | $a_j\ wkn\_{\prec}\ a_i$ |
| $a_i\ no\_action\ a_j$ | $a_i$ does not take any action with $a_j$ | $a_j\ no\_action\ a_i$ |
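
The reverse-action column of Table 1 follows a simple rule: the verb is kept and the subordinate and superior roles are swapped, while peer actions and `no_action` map to themselves. The sketch below applies this rule to atomic and combined actions. The ASCII action names (`enh_sub` for enhancing the subordinate relation, and so on) and the function name `reverse_action` are assumptions made for illustration.

```python
# Swapping roles: what is "subordinate" for a_i is "superior" for a_j.
REVERSE_RELATION = {"peer": "peer", "sub": "sup", "sup": "sub"}


def reverse_action(action: str) -> str:
    """Map an action from a_i's view to the matching action from a_j's view."""
    if action == "no_action":
        return action
    parts = []
    for atomic in action.split("+"):          # handle combined actions
        verb, relation = atomic.split("_")    # e.g. "enh", "sub"
        parts.append(f"{verb}_{REVERSE_RELATION[relation]}")
    return "+".join(parts)


print(reverse_action("enh_sub"))           # enh_sup
print(reverse_action("wkn_sub+enh_peer"))  # wkn_sup+enh_peer
```

Applying the function twice returns the original action, which matches the mutual nature of the relations assumed in the model.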



Definition 6. Action Set 

An action set, $Act_i(t)$, is defined as the set of available actions for
agent $a_i$ at a given state $s(t)$, which might include some atomic actions
or combinations of atomic actions.

For example, in Fig. 2, the action set of agent $a_2$ at time $t$ is
$Act_2(t) = \{enh\_{\sim}, enh\_{\prec}, enh\_{\succ}, wkn\_{\sim}, no\_action\}$. Agent $a_2$ cannot execute the
actions $wkn\_{\prec}$ or $wkn\_{\succ}$ at time $t$, because $a_2$ has neither subordinates
nor superiors.

As described earlier in this section, the aim of the agent network is to
assign tasks to agents such that the communication cost among agents is
minimized and the benefit obtained by completing tasks is maximized.
Each agent, $a \in A$, has a different set of resources, written as $Res(a)$, and
a different computation capacity, which is used for completing tasks, written
as $Comp(a)$. An agent possesses information about the resources it
provides, the resources its peers could provide, and the resources all of
its subordinates and its direct superior could provide. For example, in
Fig. 2, the contents in parentheses denote the resources supplied by an
agent. Thus, the information about resources that agent $a_1$ possesses can
be depicted by the following set:
$\{\{res_1, res_2, res_3\}, \{res_1, res_2\}, \{res_1, res_2, res_3, res_4\}\}$.
The first subset represents the resource information of agent $a_1$'s peers,
including $a_1$ itself. The second subset describes
the resources that agent $a_1$'s direct superior can provide. The last subset
indicates the resources that are supplied by all of $a_1$'s subordinates in
total, even though $a_1$ might have no idea exactly which subordinate
owns which resource.

The task environment presents a continuous dynamic stream of 
tasks that have to be completed. Each task, $\Phi$, is composed of a set of 
subtasks, i.e., $\Phi = \{\varphi_1, \ldots, \varphi_m\}$. Each subtask, $\varphi_i \in \Phi$, requires a particular 
resource and a specific amount of computation capacity to fulfill. In addition, 
each subtask has a relevant benefit paid to the agent that successfully 
completes it. Each task also has a preset deadline. If the deadline is 
missed, the benefit decreases gradually over time until it reaches 0. 
Xu et al. [46] have shown that token-based mechanisms can collect as much 
information as broadcast mechanisms while using much less bandwidth. Thus, 
in this paper, a subtask $\varphi_i$ is modeled as a token $\Delta_i$, which can be passed 
through the network to find a suitable agent to complete it. Each token contains 
not only the resource and computation requirements of the corresponding 
subtask, but also the token travel path, which is composed of the agents 
that the token has passed. During the allocation of a subtask $\varphi$, 
an agent $a_i$ always tries to execute the subtask by itself if it has adequate 
resources and computation capacity. Otherwise, $a_i$ generates a token 
for the subtask and passes the token to one of its subordinates, provided that 
the subordinates' resource information contains the expected resource. Since $a_i$ 
does not know which subordinate has which resource, the token might be passed 
several steps through the agent network, forming a delegation chain. If $a_i$ finds 
no suitable subordinate (i.e., no subordinate contains the expected resource), 
it will try to pass the token to its peers. If no peer is capable of the 
subtask, $a_i$ passes the token back to one of its superiors, which will 
attempt to find other subordinates or peers for delegation. When 
more than one agent is able to accept the token, $a_i$ always passes the 
token to the agent that has the highest relation strength with $a_i$. Here, it 
should be emphasized that our focus is not on distributed task allocation. 
Instead, it is on adapting the underlying agent network structure 
while allocating and executing tasks, towards optimizing the efficiency 
of task completion. Thus, our work is independent of the specific task 
allocation algorithms that the agents may employ. 
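
The token-passing order described above (the agent itself, then subordinates, then peers, then superiors, preferring the strongest relation) can be illustrated with a minimal Python sketch. The `Agent` class, the function names and the `max_steps` guard are hypothetical, and the sketch assumes that an agent knows only the pooled resources of its subordinates or peers, and that a token may be handed back to a superior it has already visited.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    resources: set
    # Each map: neighbor name -> relation strength in [0, 1].
    subordinates: dict = field(default_factory=dict)
    peers: dict = field(default_factory=dict)
    superiors: dict = field(default_factory=dict)

def strongest(neighbors, exclude=()):
    """Neighbor with the highest relation strength that is not excluded, else None."""
    options = {n: s for n, s in neighbors.items() if n not in exclude}
    return max(options, key=options.get) if options else None

def next_hop(agent, network, resource, path):
    """Order of preference: the agent itself, subordinates, peers, superiors."""
    if resource in agent.resources:
        return agent.name
    for group in (agent.subordinates, agent.peers):
        # Assumption: only the pooled resources of a group are known,
        # not which member owns which resource.
        pooled = set().union(*(network[n].resources for n in group))
        hop = strongest(group, exclude=path)
        if resource in pooled and hop is not None:
            return hop
    return strongest(agent.superiors)  # hand the token back up

def allocate(start, network, resource, max_steps=20):
    """Pass a token until an agent can execute the subtask; return (executor, path)."""
    path, current = [start], network[start]
    for _ in range(max_steps):
        hop = next_hop(current, network, resource, path)
        if hop is None or hop == current.name:
            return hop, path
        path.append(hop)
        current = network[hop]
    return None, path

network = {
    "a": Agent("a", {"r0"}, subordinates={"b": 0.9, "c": 0.4}),
    "b": Agent("b", {"r1"}, superiors={"a": 0.9}),
    "c": Agent("c", {"r2"}, superiors={"a": 0.4}),
}
executor, path = allocate("a", network, "r2")
```

In this toy network the token first goes to the strongest subordinate, which lacks the resource and returns it, so the delegation chain is longer than a direct hand-off would be. This is the kind of overhead that relation adaptation aims to reduce.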

Evidently, the structure of the agent network influences the 
task allocation process. In the next section, we describe the 
composite self-organization mechanism used to adapt the structure of the 
agent network, together with an evaluation method to measure the profit 
of the network. 

4. The composite self-organization mechanism 

Before devising the self-organization mechanism, it is necessary to 
introduce an evaluation method to estimate the profit of the agent 
network. Towards this goal, we present the evaluation criteria, 
which include the cost, benefit, profit and reward of an agent and of the agent 
network. 

4.1. Network performance evaluation 

The cost, benefit and profit of the network are calculated after a 
predefined number of time steps. The cost of the agent network, $Cost_{NET}$, 
consists of four attributes, i.e., communication cost, computation cost 
consumed by agents to complete assigned subtasks, management cost 
for maintaining subtasks and management cost for keeping neighborhood 
relations with other agents. We now describe the calculation of 
each of these four attributes. 

4.1.1. Communication cost 

Communication cost depends on how many tokens are transmitted 
during the task allocation process and can be calculated by using Eq. (1). 

$$Cost_{NET} = \sum_{i=1}^{|A|} \left( C_{\sim} \cdot c_{i\sim} + C_{\succ} \cdot c_{i\succ} + C_{\prec} \cdot c_{i\prec} \right) \qquad (1)$$

where $A$ is the set of agents in the network, $c_{i\sim}$, $c_{i\succ}$ and $c_{i\prec}$ are the numbers 
of tokens agent $a_i$ passes to its peers, subordinates and superiors, respectively, 
and, correspondingly, $C_{\sim}$, $C_{\succ}$ and $C_{\prec}$ are the communication cost 
coefficients for transmitting tokens to peers, subordinates and 
superiors. It is assumed that $C_{\succ} < C_{\sim} < C_{\prec}$, namely that passing tokens to 
subordinates incurs the lowest communication cost, while passing tokens 
to superiors incurs the highest communication cost. Therefore, agents 
always try to pass tokens to subordinates first, then peers, and finally superiors, 
which matches the task allocation procedure described in Section 3.2. 
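
As a small worked example of Eq. (1), the following Python sketch sums coefficient-weighted token counts over all agents. The coefficient values and the token counts are hypothetical; they are chosen only to respect the assumed ordering in which subordinates are cheapest and superiors most expensive.

```python
# Hypothetical coefficients: subordinate < peer < superior.
COEFF = {"subordinate": 1.0, "peer": 2.0, "superior": 3.0}

def communication_cost(token_counts, coeff=COEFF):
    """Eq. (1): sum over agents of coefficient-weighted token counts."""
    return sum(
        coeff[relation] * count
        for counts in token_counts.values()
        for relation, count in counts.items()
    )

# Tokens each agent passed to its peers, subordinates and superiors.
token_counts = {
    "a1": {"peer": 2, "subordinate": 5, "superior": 1},
    "a2": {"peer": 0, "subordinate": 3, "superior": 4},
}
total_comm_cost = communication_cost(token_counts)  # 12.0 for a1 plus 15.0 for a2
```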

4.1.2. Computation cost 

Computation cost is measured as the amount of computation 
executed by the agents for completing the assigned tasks, as in Eq. (2), 

$$Load_{NET} = \sum_{i=1}^{|A|} \sum_{j=1}^{|T_i|} Comp_{ij} \qquad (2)$$

where $Comp_{ij}$ is the amount of computation agent $a_i$ consumes 
for executing a subtask, which is represented by token $\Delta_j$, and $T_i$ is 
the set of subtasks that have been executed by agent $a_i$. 

4.1.3. Management cost 

Management cost consists of two types of costs. 
The first management cost is the management computation cost, which 
is for maintaining subtasks. Each agent keeps two waiting lists. 
One stores the subtasks owned by the agent, say $a_i$, which have not 
yet been finished, written as $T_i^{W1}$, and the other records the subtasks 
assigned by other agents to $a_i$, written as $T_i^{W2}$. Therefore, the 
management computation load for agent $a_i$ can be calculated by using Eq. (3), 

$$load_i^{1} = M \left( \sum_{j=1}^{|T_i^{W1}|} t_j^{(1)} + \sum_{j=1}^{|T_i^{W2}|} t_j^{(2)} \right) \qquad (3)$$

where $t_j^{(1)}$ is the number of time steps during which agent 
$a_i$ maintains one of its own subtasks in $T_i^{W1}$ before the subtask is 
completed, while $t_j^{(2)}$ is the number of time steps during which 
agent $a_i$ keeps one of the assigned subtasks in $T_i^{W2}$ before the subtask 
is finished. $M$ is the management computation coefficient. 

The second management cost for an individual agent is the 
management relation load generated by keeping relations with 
other agents, which can be calculated by using Eq. (4), 

$$load_i^{2} = M_{\sim} \sum_{j=1}^{|Neig_i^{\sim}|} t_{ij}^{\sim} + M_{\succ} \sum_{j=1}^{|Neig_i^{\succ}|} t_{ij}^{\succ} + M_{\prec} \sum_{j=1}^{|Neig_i^{\prec}|} t_{ij}^{\prec} \qquad (4)$$

where $Neig_i^{\sim}$, $Neig_i^{\succ}$ and $Neig_i^{\prec}$, defined in Definition 3, 
are composed of $a_i$'s peers, subordinates and superiors, 
respectively. $t_{ij}^{\sim}$, $t_{ij}^{\succ}$ and $t_{ij}^{\prec}$ are the numbers of time steps 
for which agent $a_i$ keeps the respective relations with agent 
$a_j$. $M_{\sim}$, $M_{\succ}$ and $M_{\prec}$ are the management relation coefficients for the 
peer, subordinate and superior relations, respectively. It is supposed that 
$M_{\succ} < M_{\sim} < M_{\prec}$, namely that an agent maintaining the superior-subordinate 
relation needs the lowest management load, while maintaining 
the subordinate-superior relation is the most expensive. 
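
The two management loads of Eqs. (3) and (4) can be sketched for a single agent as follows. The coefficient values and the waiting and relation durations are hypothetical; only the ordering of the relation coefficients follows the assumption stated above.

```python
def management_computation_load(own_waits, assigned_waits, m=0.5):
    """Eq. (3): M times the time steps subtasks spend in the two waiting lists."""
    return m * (sum(own_waits) + sum(assigned_waits))

# Hypothetical coefficients: subordinate < peer < superior.
M_REL = {"subordinate": 0.1, "peer": 0.2, "superior": 0.3}

def management_relation_load(durations, m_rel=M_REL):
    """Eq. (4): durations maps a relation type to the time steps each relation was kept."""
    return sum(m_rel[rel] * sum(steps) for rel, steps in durations.items())

# Two own subtasks waited 3 and 5 steps; one assigned subtask waited 2 steps.
load_1 = management_computation_load([3, 5], [2])
# Two peers kept for 10 and 20 steps, one subordinate for 30, one superior for 10.
load_2 = management_relation_load(
    {"peer": [10, 20], "subordinate": [30], "superior": [10]}
)
```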

The benefit of each agent depends on how many subtasks are 
completed by the agent. As described in Section 3.2, each task $\Phi$ 
contains several subtasks, $\varphi_1, \varphi_2, \ldots$, represented as tokens $\Delta_1, \Delta_2, \ldots$, 
and when a subtask $\varphi_i$ is successfully completed, the agent that 
executes this subtask obtains the relevant benefit. 

Therefore, the benefit of the network is the sum of the benefits 
that all the agents obtain in the network, as in Eq. (5), 

$$Benefit_{NET} = \sum_{i=1}^{|A|} benefit(a_i), \qquad (5)$$

where $benefit(a_i)$ is the benefit that agent $a_i$ obtained for successfully 
completing subtasks. 

The reward of each agent, $\mathcal{R}_i$, is based on how much load could be 
reduced on agent $a_i$, how much load could be decreased on the 
intermediate agents, and how much potential benefit might be obtained by agent 
$a_i$ in the future. Here, an intermediate agent is an agent that resides on a 
token path. For example, agent $a_i$ has no relation with agent $a_j$, but $a_i$ 
received many subtasks from $a_j$ before. $a_i$ then decides whether to 
form a relation with $a_j$. Suppose that $a_i$ would like to form the 
subordinate-superior relation with $a_j$, i.e., to perform the action $form_{\prec}$ (for $a_j$, 
this means performing the reverse action $form_{\succ}$, recall Table 1). The load reduction 
on $a_i$ is $-M_{\prec} \cdot t_{ij}$, as $a_i$ would maintain a new relationship with $a_j$, where 
$t_{ij}$ is the expected time length that $a_i$ will keep this relation with $a_j$ and 
the negative sign denotes that this value represents an increase in the load. 
However, other intermediate agents between $a_i$ and $a_j$ are affected by 
$a_i$'s decision and would obtain rewards, because they no longer need to 
pass tokens between agents $a_i$ and $a_j$. For one of those 
intermediate agents, say $a_k$, the reduced load is $C \cdot c_k$, where $c_k$ is the number of 
tokens that are passed by $a_k$ and received by $a_i$. The rewards obtained by 
those intermediate agents are counted as $a_i$'s rewards. $a_j$, in turn, can 
save management computation load, i.e., $load_j^{1}$, since it can directly pass 
tokens to $a_i$ without waiting for intermediate agents to pass tokens, but 
$a_j$ also needs to maintain a new relationship with $a_i$. The load reduction 
on $a_j$, therefore, is $M \cdot \sum_{k=1}^{m} |\Delta_k.path| - M_{\succ} \cdot t_{ji}$, where $m$ is the number of 
tokens passed from $a_j$ to $a_i$ and $|\Delta_k.path|$ means the length of the path 
$\Delta_k$ has passed. Since we assume that a token is passed between two 
agents during one time step, $|\Delta.path|$ is in fact the number of time 
steps a token spends being passed. 

The potential benefit that will be obtained by $a_i$ is evaluated on 
the basis of the benefit $a_i$ gained for completing the subtasks assigned 
by $a_j$, i.e., $\delta_{\prec} \sum_{\varphi.owner = a_j} \varphi.benefit$, where $\delta_{\prec}$ is a potential benefit 
coefficient. Analogously, for $a_j$, the potential benefit is $\delta_{\succ} \sum_{\varphi.owner = a_i} \varphi.benefit$. 
We suppose that $\delta_{\prec} > 1$, $\delta_{\sim} = 1$ and $\delta_{\succ} < 1$, namely that the action $form_{\prec}$, 
with $a_i$ as the subordinate and $a_j$ as the superior, can make $a_i$ potentially 
receive more subtasks from $a_j$ and then get more benefits. Therefore, 
the reward of an agent can be calculated by adding the load reduction 
to the potential benefit. The reward of the network, $\mathcal{R}$, is then calculated 
as the sum of the rewards achieved by all the agents in the network after 
reorganization of the network, as in Eq. (6). 

$$\mathcal{R} = \sum_{i=1}^{|A|} \mathcal{R}_i \qquad (6)$$

Finally, the profit of the entire network can be calculated by using 
Eq. (7). 

$$Profit_{NET} = Benefit_{NET} - Cost_{NET} - Load_{NET} - \sum_{i=1}^{|A|} \left( load_i^{1} + load_i^{2} \right) \qquad (7)$$
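
Eq. (7) simply aggregates the terms defined in Eqs. (1)-(5). A minimal Python sketch, with hypothetical input values and a hypothetical function name, is:

```python
def network_profit(benefits, comm_cost, comp_load, mgmt_loads):
    """Eq. (7): total benefit minus communication, computation and management costs.

    benefits   -- benefit obtained by each agent (Eq. (5) sums them)
    comm_cost  -- network communication cost from Eq. (1)
    comp_load  -- network computation load from Eq. (2)
    mgmt_loads -- per-agent pairs (management computation load, management relation load)
    """
    management = sum(l1 + l2 for l1, l2 in mgmt_loads)
    return sum(benefits) - comm_cost - comp_load - management

# Two agents; every number below is made up for illustration.
profit = network_profit(
    benefits=[50.0, 30.0],
    comm_cost=27.0,
    comp_load=20.0,
    mgmt_loads=[(5.0, 12.0), (2.0, 4.0)],
)
```

Here the total benefit is 80, the combined costs are 27 + 20 + 23 = 70, and the resulting profit is 10.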



4.2. Self-organization mechanism design 

In this paper, the aim of our self-organization mechanism is to 
improve the efficiency of task allocation in the agent network by changing 
the network structure, i.e., adapting the relations among agents. As 
described in Section 2, when an agent, $a_i$, wants to adapt relations with 
other agents, it faces two problems. 
The first problem is determining with whom $a_i$ should adapt relations, 
and the second is determining how to adapt relations with those 
selected agents. For the first problem, the selection process of each agent 
is based not only on its own former task allocation process but also on 
the integrated trust model. Through the trust model, an agent can get 
opinions from other agents about candidate selection. For example, 
when $a_i$ is considering $a_j$ as one of its candidates for adapting relations, 
$a_i$ will ask other agents, say $a_k$ and $a_l$, for their opinions regarding 
whether $a_j$ is worthy of being a candidate. For the second problem, i.e., 
how to adapt relations, a multi-agent Q-learning approach is employed. 
The reason for choosing the Q-learning approach is that it provides a 
simple and suitable methodology for representing our mechanism in 
terms of actions and rewards. The self-organization mechanism is 
now illustrated in the following subsections. 

4.2.1. Candidate selection 

To assist each agent in selecting the most valuable candidates with which to adapt 
relations, we present a trust model based on Dezert-Smarandache theory 
(DSmT) [36]. Many researchers have made prominent efforts on trust 
and reputation models, such as probabilistic theory-based trust models 
(Teacy et al. [41]), the certified reputation model (Huynh et al. [19]), and 
evidential trust models (Wang and Sun [43]). In addition, a thorough survey 
of trust and reputation systems for online service provision was given 
by Jøsang et al. [20]. Briefly, trust models are usually developed for 
identifying whether an agent is trustworthy or not. To the best of 
our knowledge, this paper is the first to use a trust model for candidate selection in 
an agent network for relation adaptation, and there is no existing 
quantitative comparison between different trust models for agent 
candidate selection. Thus, the criteria on which we choose a trust 
model are whether the trust model is suitable for candidate 
selection, whether it is easy to understand for both the readers and us, and 
whether it is easy to implement. The emphasis of this paper is not on 
developing a new and efficient trust model. Instead, the trust model is used 
only as a complementary part of our self-organization mechanism. 
Our investigation suggests that Dezert-Smarandache theory is more suitable 
for our requirements than other trust models, because the theory has 
good expressiveness and low computational complexity for trust 
representation, and is easy to implement. 

We now introduce the key concepts of Dezert-Smarandache 
theory [36]. Let $T$ mean that the given agent considers a given 
correspondent to be trustworthy and $\Theta = \{T, \neg T\}$ be the general frame of 
discernment. The hyper-power set of $\Theta$ is represented as $H_\Theta = \{\emptyset, 
\{T\}, \{\neg T\}, \{T \cap \neg T\}, \Theta\}$. There is a general basic belief assignment, which 
is a function $m: H_\Theta \to [0, 1]$ where 

$$\begin{cases} m(\emptyset) = 0 \\ \sum_{B \in H_\Theta} m(B) = 1. \end{cases} \qquad (8)$$

Thus, $m(\{T\}) + m(\{\neg T\}) + m(\{\Theta\}) + m(\{T \cap \neg T\}) = 1$, as $m(\emptyset) = 0$. 
The trust model is defined as a tuple, $\mathcal{T}_j^i = \langle m_j^i(\{T\}), m_j^i(\{\neg T\}), 
m_j^i(\{\Theta\}), m_j^i(\{T \cap \neg T\}) \rangle$, where $i$ and $j$ represent two different agents $a_i$ 
and $a_j$, respectively. Each element in $\mathcal{T}_j^i$ is described as follows: 

1. $m_j^i(\{T\})$ means the degree of $a_i$ trusting $a_j$; 

2. $m_j^i(\{\neg T\})$ indicates the degree of $a_i$ distrusting $a_j$; 

3. $m_j^i(\{\Theta\})$ is the degree of uncertainty about the 
trustworthiness of $a_j$ to $a_i$. This case happens when $a_i$ lacks evidence regarding $a_j$. 
If $a_i$ has no evidence at all, $m_j^i(\{\Theta\}) = 1$; otherwise, $m_j^i(\{\Theta\}) < 1$; 

4. $m_j^i(\{T \cap \neg T\})$ is the degree of contradiction with regard to the 
trustworthiness of $a_j$ to $a_i$. This case arises, for 
example, when $a_i$ trusts $a_j$ but other agents, which provide $a_i$ with their 
opinions, distrust $a_j$. Thereby, $a_i$ faces a contradiction. 

Thus, initially, trust between any two agents is $\mathcal{T}_j^i = \langle m_j^i(\{T\}) = 0, 
m_j^i(\{\neg T\}) = 0, m_j^i(\{\Theta\}) = 1, m_j^i(\{T \cap \neg T\}) = 0 \rangle$. Trust is then acquired 
through task allocation between agents. Suppose that $a_i$ is evaluating 
the trust of $a_j$ and $a_j$ has completed $x$ subtasks for $a_i$. $a_i$ uses the notion of 
QoS (Quality of Service) to express how satisfied it is with $a_j$'s service 
on each subtask. Therefore, for $x$ subtasks, there are $x$ QoS values. QoS 
is in the range [0, 1], and its calculation depends on the time spent 
to complete a subtask. For example, if $a_j$ can finish a subtask on time 
for $a_i$, $a_i$'s estimation on this subtask is $QoS = 1$; otherwise, the value 
of QoS decreases over time. Based on QoS, $a_i$'s trust evaluation 
of $a_j$ can be obtained through Eq. (9). 

$$\begin{cases} m_j^i(\{T\}) = \sum_{QoS > \Omega} QoS \\ m_j^i(\{\neg T\}) = \sum_{QoS < \omega} QoS \\ m_j^i(\{\Theta\}) = \sum_{\omega \le QoS \le \Omega} QoS \end{cases} \qquad (9)$$

where $\omega$ and $\Omega$, $0 < \omega < \Omega < 1$, are lower and upper thresholds, 
respectively. After calculation, the elements in $\mathcal{T}_j^i$ are then normalized to 
satisfy $\sum_{B \in H_\Theta} m_j^i(B) = 1$. 
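
The trust evaluation of Eq. (9), followed by the normalization step, can be sketched in Python as below. The threshold values `lower` and `upper` are hypothetical (the paper does not list them here), the dictionary keys are illustrative names for the four elements of the trust tuple, and the no-evidence case is assumed to return the initial tuple with full uncertainty.

```python
def trust_from_qos(qos_values, lower=0.3, upper=0.7):
    """Eq. (9) plus normalization: turn QoS observations into a trust tuple."""
    masses = {
        "T": sum(q for q in qos_values if q > upper),        # trust
        "notT": sum(q for q in qos_values if q < lower),     # distrust
        "Theta": sum(q for q in qos_values if lower <= q <= upper),  # uncertainty
        "conflict": 0.0,  # contradiction only arises when opinions are combined
    }
    total = sum(masses.values())
    if total == 0:
        # No usable evidence: fall back to the initial, fully uncertain tuple.
        return {"T": 0.0, "notT": 0.0, "Theta": 1.0, "conflict": 0.0}
    return {element: mass / total for element, mass in masses.items()}

# Four completed subtasks: two on time or nearly so, one mediocre, one very late.
trust = trust_from_qos([1.0, 0.9, 0.5, 0.1])
```

For these four observations the unnormalized masses are 1.9, 0.1 and 0.5, so the normalized tuple is approximately (0.76, 0.04, 0.20, 0).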

In addition, $a_i$ might ask one of its neighboring agents, say $a_k$, for its trust 
evaluation of $a_j$, and then combine $a_k$'s opinion with $a_i$'s own view of $a_j$. 
This is trust evaluation combination, which can be computed as 

$$\mathcal{T}_j^i = \mathcal{T}_j^i \oplus \mathcal{T}_j^k \qquad (10)$$

where 

$$m_j^i(B_1) = m_j^i(B_2) \oplus m_j^k(B_3) = \sum_{(B_2, B_3 \in H_\Theta) \wedge (B_2 \cap B_3 = B_1)} m_j^i(B_2)\, m_j^k(B_3).$$
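
The combination rule in Eq. (10) multiplies the masses of every pair of elements and assigns the product to their intersection. A minimal Python sketch follows; it assumes that each non-empty element of the hyper-power set can be encoded as a set of Venn-diagram regions, so that intersection reduces to ordinary set intersection. The element names and region labels are hypothetical.

```python
from itertools import product

# Each non-empty element of the hyper-power set as a set of Venn regions.
ELEMENTS = {
    "T": frozenset({"t_only", "both"}),
    "notT": frozenset({"n_only", "both"}),
    "conflict": frozenset({"both"}),                    # T and not-T together
    "Theta": frozenset({"t_only", "n_only", "both"}),   # total ignorance
}
NAME_OF = {regions: name for name, regions in ELEMENTS.items()}

def combine(own, other):
    """Eq. (10): the mass of B1 sums own(B2) * other(B3) over all B2, B3 with B2 ∩ B3 = B1."""
    combined = dict.fromkeys(ELEMENTS, 0.0)
    for (b2, w2), (b3, w3) in product(own.items(), other.items()):
        b1 = NAME_OF[ELEMENTS[b2] & ELEMENTS[b3]]
        combined[b1] += w2 * w3
    return combined

own = {"T": 0.6, "notT": 0.1, "Theta": 0.3, "conflict": 0.0}
other = {"T": 0.5, "notT": 0.2, "Theta": 0.3, "conflict": 0.0}
merged = combine(own, other)
```

With these two opinions the disagreement between trust and distrust moves 0.17 of the mass onto the contradiction element, while the total mass stays 1.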

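Furthermore, there might be another case: $a_i$ asks $a_k$'s opinion of $a_j$, 
but $a_k$ may have no idea about $a_j$. $a_k$ then inquires about the opinion that one of its 
neighbors, $a_l$, holds of $a_j$. This is trust transitivity, which can be calculated as 

$$\mathcal{T}_j^k = \mathcal{T}_l^k \otimes \mathcal{T}_j^l, \qquad (11)$$

where 

$$\begin{aligned}
m_j^k(\{T\}) &= \left( m_l^k(\{T\}) + \theta\, m_l^k(\{T \cap \neg T\}) \right) \cdot m_j^l(\{T\}) \\
m_j^k(\{\neg T\}) &= \left( m_l^k(\{\neg T\}) + \theta\, m_l^k(\{T \cap \neg T\}) \right) \cdot m_j^l(\{\neg T\}) \\
m_j^k(\{T \cap \neg T\}) &= \left( m_l^k(\{T \cap \neg T\}) + \theta\, m_l^k(\{T \cap \neg T\}) \right) \cdot m_j^l(\{T \cap \neg T\}) \\
m_j^k(\{\Theta\}) &= 1 - m_j^k(\{T\}) - m_j^k(\{\neg T\}) - m_j^k(\{T \cap \neg T\}),
\end{aligned}$$

and $\theta$ is a constant in the range (0, 1). 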

We are now ready to describe the candidate selection process, 
which is shown in Algorithm 1. 

Algorithm 1: candidate selection according to $a_i$. 

1   $Candidates_i \leftarrow \emptyset$, $TempCands_i \leftarrow \emptyset$; 
2   After predefined time steps, through Eq. (9), $a_i$ evaluates the trust of its neighbors and of those agents that completed $a_i$'s subtasks but are not $a_i$'s neighbors; 
3   $TempCands_i \leftarrow$ all the agents mentioned in Line 2; 
4   for each $a_j \in TempCands_i$ do 
5       if $m_j^i(\{T\}) > \rho_1$ and $a_j \notin Neig_i^{\sim} \cup Neig_i^{\succ} \cup Neig_i^{\prec}$ then 
6           $a_i$ inquires other agents' opinions; 
7       else if $m_j^i(\{\neg T\}) < \rho_2$ and $a_j \in Neig_i^{\sim} \cup Neig_i^{\succ} \cup Neig_i^{\prec}$ then 
8           $a_i$ inquires other agents' opinions; 
9       end if 
11  end for 
12  $a_i$ synthesizes opinions via Eqs. (10) and (11); 
13  for each $a_j \in TempCands_i$ do 
14      if $m_j^i(\{T\}) > \rho_1$ and $a_j \notin Neig_i^{\sim} \cup Neig_i^{\succ} \cup Neig_i^{\prec}$ then 
15          $Candidates_i \leftarrow Candidates_i \cup \{a_j\}$; 
16      else if $m_j^i(\{\neg T\}) < \rho_2$ and $a_j \in Neig_i^{\sim} \cup Neig_i^{\succ} \cup Neig_i^{\prec}$ then 
17          $Candidates_i \leftarrow Candidates_i \cup \{a_j\}$; 
18      end if 
19  end for 



First, in Line 2, after a period, $a_i$ should have some subtasks 
completed by other agents. $a_i$ then evaluates the trust of those agents 
that completed $a_i$'s subtasks by using Eq. (9). Meanwhile, $a_i$ also 
evaluates the trust of its neighboring agents, as $a_i$ might want to 
weaken relations with those neighbors that cannot complete $a_i$'s 
subtasks on time. In Lines 4-11, $a_i$ inquires other agents' opinions about 
the potential candidates in $TempCands_i$. Here, $\rho_1$ and $\rho_2$ are two 
thresholds in the range (0, 1). The inquiry process begins with $a_i$ 
contacting a neighbor. If this neighbor cannot provide an answer to $a_i$, 
it gives $a_i$ a referral that designates another agent. The process 
terminates in success when a trust value tuple, i.e., $\mathcal{T}_j^k$, is received, and 
in failure when the depthLimit is reached or when the inquiry arrives 
at an agent that gives neither an answer nor a referral. Here, 
depthLimit restricts the length of referral chains. Kautz et al. [25] have 
shown that shorter referral chains are more likely to be fruitful and 
accurate, so we set depthLimit to 2 in this paper. Line 5 indicates 
that $a_j$ is not a neighbor of $a_i$ but $a_j$ could complete $a_i$'s subtasks on 
time, while Line 7 indicates that $a_j$ is a neighbor of $a_i$ but $a_j$ has 
a bad task completion record. After the inquiry, $a_i$ synthesizes other 
agents' opinions and selects candidates again (Lines 12-19). After 
candidate selection, $a_i$ starts the relation adaptation with those 
candidates, which is introduced in the next subsection. 
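
The referral-based inquiry described above can be sketched as a short loop. This is only an illustration: the dictionary layout (`opinions`, `referral`) is hypothetical, and the sketch assumes that depthLimit counts the number of referrals followed after the first contact.

```python
def inquire(first_contact, target, agents, depth_limit=2):
    """Follow referrals until an agent reports a trust tuple about `target`.

    Returns the trust tuple on success, or None when the depth limit is
    reached or an agent offers neither an answer nor a referral.
    """
    current, referrals_used = first_contact, 0
    while True:
        opinions = agents[current]["opinions"]
        if target in opinions:
            return opinions[target]          # success: a trust tuple is received
        referral = agents[current].get("referral")
        if referral is None or referrals_used == depth_limit:
            return None                      # failure: dead end or chain too long
        current, referrals_used = referral, referrals_used + 1

# Hypothetical chain of agents: b refers to c, c to d, d to e.
agents = {
    "b": {"opinions": {}, "referral": "c"},
    "c": {"opinions": {"j": {"T": 0.8, "notT": 0.1, "Theta": 0.1, "conflict": 0.0}},
          "referral": "d"},
    "d": {"opinions": {}, "referral": "e"},
    "e": {"opinions": {"x": {"T": 1.0, "notT": 0.0, "Theta": 0.0, "conflict": 0.0}},
          "referral": None},
}
answer = inquire("b", "j", agents)       # found after one referral
too_far = inquire("b", "x", agents)      # would need three referrals, so it fails
```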

4.2.2. Relation adaptation 

Before describing our relation adaptation algorithm, we first 
consider a simple scenario with two agents, $a_i$ and $a_j$, and three available 
actions for each agent. The reward matrix of the two agents is 
displayed in Table 2. 

Each cell $(r_i^{x,y}, r_j^{x,y})$ in Table 2 represents the rewards received by the 
row agent ($a_i$) and the column agent ($a_j$), respectively, if the row agent 
$a_i$ plays action $x$ and the column agent $a_j$ plays action $y$. Algorithm 2 
presents our relation adaptation approach in pseudocode form. 



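Algorithm 2: relation adaptation according to $a_i$. 

1   for each $a_j \in Candidates_i$ do 
2       $Act_i \leftarrow available\_actions(a_i, a_j)$; 
3       $Act_j \leftarrow available\_actions(a_j, a_i)$; 
4       for each $x \in Act_i$, $y \in Act_j$ do 
5           Initialise $Q_{ix}$ and $Q_{jy}$ arbitrarily; 
6           for $k = 0$ to a predefined integer do 
7               calculate $\pi_{ix}(k)$ and $\pi_{jy}(k)$; 
8               $Q_{ix}(k+1) = Q_{ix}(k) + \pi_{ix}(k)\, \alpha \left( \sum_{y} r_i^{x,y} \pi_{jy}(k) - Q_{ix}(k) \right)$; 
9               $Q_{jy}(k+1) = Q_{jy}(k) + \pi_{jy}(k)\, \alpha \left( \sum_{x} r_j^{x,y} \pi_{ix}(k) - Q_{jy}(k) \right)$; 
10          end for 
11      end for 
12      $\langle x_{opti}, y_{opti} \rangle \leftarrow \arg\max_{match(x,y)} (Q_{ix} + Q_{jy})$; 
13      $a_i$, $a_j$ take actions $x_{opti}$ and $y_{opti}$, respectively; 
14      $\mu_{ij} \leftarrow \mu_{ij} + (N_j^i / \rho_3 - 1)$; 
15      if $\mu_{ij} > 1$ then $\mu_{ij} \leftarrow 1$; 
16      if $\mu_{ij} < 0$ then $\mu_{ij} \leftarrow 0$; 
17      $\mu_{ji} \leftarrow \mu_{ij}$; 
18  end for 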
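
Algorithm 2 can be illustrated with a minimal Python sketch of the learning loop (Lines 4-11), the matched-action selection (Line 12) and the relation strength update (Lines 14-16). Several details are assumptions made only for illustration: Q-values start at zero although the algorithm allows arbitrary initial values, both agents are updated synchronously, ties in the greedy choice go to the first action, and the action names and the `reverse` map (a stand-in for the reverse actions of Table 1) are hypothetical.

```python
def epsilon_greedy(q, epsilon):
    """Eq. (12): the greedy action gets (1 - eps) + eps/n, every other action eps/n."""
    n = len(q)
    best = max(q, key=q.get)
    return {a: ((1 - epsilon) if a == best else 0.0) + epsilon / n for a in q}

def learn_q(actions_i, actions_j, r_i, r_j, alpha=0.2, epsilon=0.4, rounds=100):
    """Lines 4-11: r_i[x, y] and r_j[x, y] are the rewards of the joint action (x, y)."""
    q_i = dict.fromkeys(actions_i, 0.0)
    q_j = dict.fromkeys(actions_j, 0.0)
    for _ in range(rounds):
        pi_i, pi_j = epsilon_greedy(q_i, epsilon), epsilon_greedy(q_j, epsilon)
        q_i = {x: q_i[x] + pi_i[x] * alpha *
               (sum(r_i[x, y] * pi_j[y] for y in actions_j) - q_i[x]) for x in actions_i}
        q_j = {y: q_j[y] + pi_j[y] * alpha *
               (sum(r_j[x, y] * pi_i[x] for x in actions_i) - q_j[y]) for y in actions_j}
    return q_i, q_j

def best_matched(q_i, q_j, reverse):
    """Line 12: among matched action pairs, pick the one maximizing Q_ix + Q_jy."""
    pairs = [(x, y) for x in q_i for y in q_j if reverse.get(x) == y]
    return max(pairs, key=lambda p: q_i[p[0]] + q_j[p[1]]) if pairs else None

def update_strength(mu, completed, rho3):
    """Lines 14-16: shift the relation strength, then clamp it to [0, 1]."""
    return min(1.0, max(0.0, mu + completed / rho3 - 1))

actions = ["a", "b"]
r_i = {(x, y): 2.0 if x == "a" else 0.0 for x in actions for y in actions}
r_j = {(x, y): 1.0 if y == "b" else 0.0 for x in actions for y in actions}
q_i, q_j = learn_q(actions, actions, r_i, r_j)
choice = best_matched(q_i, q_j, reverse={"a": "b", "b": "a"})
```

In this toy reward matrix, agent $a_i$ is rewarded only for action `a` and agent $a_j$ only for action `b`, so the learned Q-values approach 2 and 1 for those actions and the matched pair with the largest Q-value sum is selected.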



Table 2 
Reward matrix of $a_i$ and $a_j$. 

$a_i \backslash a_j$      $enh_{\sim}$                  $enh_{\succ}$                  $enh_{\prec}$ 
$enh_{\sim}$      $r_i^{1,1}, r_j^{1,1}$      $r_i^{1,2}, r_j^{1,2}$      $r_i^{1,3}, r_j^{1,3}$ 
$enh_{\succ}$      $r_i^{2,1}, r_j^{2,1}$      $r_i^{2,2}, r_j^{2,2}$      $r_i^{2,3}, r_j^{2,3}$ 
$enh_{\prec}$      $r_i^{3,1}, r_j^{3,1}$      $r_i^{3,2}, r_j^{3,2}$      $r_i^{3,3}, r_j^{3,3}$ 

After selecting candidates, $a_i$ and $a_j$ first estimate which actions 
are available at the current state (Lines 2 and 3), as described in 
Section 3.2. Then, $a_i$ and $a_j$ learn the Q-value of each available action 
separately (Lines 4-11). In Line 5, the Q-value of each action is 
initialized arbitrarily, while, in Line 7, $\pi_{ix}$ indicates the probability of 
agent $a_i$ taking the action $x$. To calculate $\pi_{ix}$, we employ the $\epsilon$-greedy 
exploration method devised by Gomes and Kowalczyk [11], shown 
in Eq. (12), where $0 < \epsilon < 1$ is a small positive number and $n$ is the 
number of available actions of $a_i$. The reason for choosing the $\epsilon$-greedy 
exploration method is that it has been validated in many 
applications; for example, Galstyan et al. [8] applied the method to 
develop a decentralized resource allocation mechanism. 

$$\pi_{ix} = \begin{cases} (1 - \epsilon) + \epsilon/n, & \text{if } Q_{ix} \text{ is the highest} \\ \epsilon/n, & \text{otherwise} \end{cases} \qquad (12)$$

Furthermore, in Line 8, $Q_{ix}$ is the Q-value of action $x$ taken by agent 
$a_i$. In Lines 8 and 9, $0 < \alpha < 1$ is the learning rate. 

After learning the Q-values, $a_i$ and $a_j$ (Line 12) cooperate to find 
the optimal actions for both of them, where $match(x, y)$ is a function 
used to test whether the actions $x$ and $y$ taken by $a_i$ and $a_j$, 
respectively, are matched. An action is only matched by its reverse action 
described in Table 1. Therefore, $a_i$ and $a_j$ have to cooperate to find the 
actions that can be matched together and maximize the sum of their 
Q-values. Finally, $a_i$ and $a_j$ adjust their relation strength (Lines 14-17). The 
adjustment depends on how many subtasks $a_j$ completed for $a_i$ in the 
last time steps: the more subtasks completed by $a_j$, the higher the 
relation strength value achieved. In Line 14, $N_j^i$ means the number of $a_i$'s 
subtasks completed by $a_j$, and $\rho_3$ is a threshold, which is an integer. 

Having illustrated the fundamental part of our self-organization 
mechanism, we now discuss how this mechanism deals with an open 
agent network. When a new agent $a_j$ joins the network, it has no historical 
interactions with the existing agents in the network. Thereby, $a_j$ cannot 
use Algorithm 1 to select candidates with which to establish relations. Generally, $a_j$ 
prefers to connect to an existing agent, $a_i$, that has more uncompleted 
subtasks than other agents, in order to obtain benefits by completing 
those subtasks. At the same time, $a_j$ does not like to connect to an $a_i$ that 
already maintains many neighbors, because the more neighbors $a_i$ has, the less 
opportunity there is for $a_j$ to get subtasks from $a_i$. In addition, too many 
neighbors will bring a heavy workload on $a_i$ for keeping relations with those 
neighbors. Thus, the candidate selection of $a_j$ is based on the ratio between each 
existing agent's number of uncompleted subtasks and its number of neighbors 
(usually known as degree), recorded as TDR (Task-Degree Ratio). If the 
TDR of $a_i$ is larger than a predefined threshold, $\rho_4$, $a_j$ builds a 
subordinate-superior relation with $a_i$, with $a_j$ being the subordinate and 
$\mu_{ij} = \mu_{ji} = 0.5$. In contrast, when an existing agent leaves, the other agents 
can easily reorganize by using our self-organization mechanism. 

5. Experiment and analysis 

In this section, the effectiveness of our self-organization mechanism is 
demonstrated through experimental evaluation. We first describe the 
experimental setup and thereafter present the experimental results and 
analysis. 

5.1. Experimental setup 


To objectively exhibit the effectiveness of our self-organization 
mechanism, named Learn-Adapt, we compare our mechanism with three other 
mechanisms, i.e., Central, K-Adapt [27] and Learn-Adapt without Trust. As 
the aim of our work is to extend Kota et al.'s [27] work, i.e., K-Adapt, we 
choose K-Adapt as a standard for comparison. Moreover, 
K-Adapt is the first self-organization mechanism that considers 
multiple relations in an agent network, whereas other related works 
consider only one type of relation. Thus, Kota et al.'s work represents the 
most efficient work in our research domain. If our mechanism 
outperforms Kota et al.'s work to some extent, our contribution 
is demonstrated. It is difficult to use other existing approaches for 
comparison because they were designed for a different environment (the 
existence of only one type of relation) from ours. 

1. Central: this is an ideal centralized task allocation mechanism in which 
there is an external omniscient central manager that maintains 
information about all the agents and tasks in the network. The central 
manager is able to interact with all the agents in the network without cost. 
This method is neither practical nor robust, but it can be used as an 
upper bound on the performance of an organization in our experiment. 
Comparing with the Central mechanism gives an intuitive 
view of how well the decentralized mechanisms perform and of the 
performance gap between the decentralized mechanisms and the best 
mechanism, i.e., the Central mechanism. It is hoped that the 
performance of a good decentralized mechanism can meet or 
approach the performance of the Central mechanism. 

2. K-Adapt: this mechanism was proposed by Kota et al. [27] and 
uses a meta-reasoning approach to adapt the relation between two 
agents. Their approach always chooses the actions that bring the 
highest utility. The aim of this paper is to improve and extend 
K-Adapt in some aspects, namely introducing a trust model, extending 
crisp relations to weighted relations, and developing a (hopefully) 
more efficient relation adaptation algorithm. Thus, we select 
K-Adapt as a standard against which to evaluate our mechanism, so that we can 
see whether our mechanism really improves on 
K-Adapt. 

3. Learn-Adapt without Trust: this mechanism is the same as 
Learn-Adapt but without the trust model. Thus, candidate selection 
is based only on an agent's own interaction history with other 
agents, i.e., $a_i$ wants to adapt relations with those agents 
that complete many or few subtasks of $a_i$. The purpose of including 
this mechanism is to reveal the significance of the trust model. 

Since the Central and K-Adapt mechanisms do not cover weighted 
relations, in these two mechanisms we simply set $\mu_{ij} = \mu_{ji} = 1$ if $a_i$ 
and $a_j$ have a relation; otherwise $\mu_{ij} = \mu_{ji} = 0$. 

In this experiment, the agent network is generated by using 
the Small World network model [44], in which most neighbors of an agent are 
connected to each other. However, the approach presented by Watts 
and Strogatz [44] deals with only one relation between agents in the 
Small World network. We thus modify the approach to accommodate 
multiple relations by randomly changing the relation between two 
neighboring agents. Moreover, in order to control the number of resources an 
agent can have, a parameter called Resource Probability (RP) is employed, 
such that an agent is assigned a resource with probability RP. Hence, as 
RP increases, agents tend to possess more resources. For simplicity, 
tasks are created by randomly generating the required resource types and 
the amount of resource of each resource type. In addition, the task arrival 
times and task required times follow Poisson and exponential 
distributions, respectively, and each task is randomly distributed to one 
of the agents in the system. Finally, the evaluation criteria consist of 
$Profit_{NET}$ (Eq. (7)), obtained by each approach and expressed as a percentage of the 
maximum network profit gained by the central approach, and the time 
consumed by these mechanisms. The experimental setting of our work is 
similar to that of Kota et al.'s work, since the essence of our work is the 
same as that of Kota et al.'s work, i.e., reasonably adapting multiple relations in 
an agent network. Our experimental setting, however, includes some 
new parameters that are specific to our work, e.g., learning rate and action 
selection distribution probability, as several novel elements, i.e., the 
trust model, weighted relations and a new relation adaptation algorithm, 
are integrated in our mechanism. For clarity, the values of the parameters that 
are used in this experiment and their meanings are listed in Table 3. These 
values were chosen experimentally to provide the best results. 
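
A setup of this kind can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: the rewiring probability `beta`, the relation labels, the seed, the choice of storing one relation type per link, and the interpretation that each resource type is assigned independently with probability RP. It builds a Watts-Strogatz style ring lattice by hand rather than relying on any external library.

```python
import random

def small_world(n, degree, beta, rng):
    """Ring lattice with `degree` neighbors per agent, then random rewiring."""
    edges = set()
    for i in range(n):
        for step in range(1, degree // 2 + 1):
            edges.add(frozenset((i, (i + step) % n)))
    for edge in sorted(edges, key=sorted):          # fixed snapshot of the lattice
        if rng.random() < beta:
            i = min(edge)
            free = [k for k in range(n) if k != i and frozenset((i, k)) not in edges]
            if free:
                edges.remove(edge)                  # rewire one end of the link
                edges.add(frozenset((i, rng.choice(free))))
    return edges

def assign_relations(edges, rng):
    """Give every link a random relation type, seen from the lower-numbered agent."""
    kinds = ("peer", "subordinate", "superior")
    return {tuple(sorted(edge)): rng.choice(kinds) for edge in edges}

def assign_resources(n, resource_types, rp, rng):
    """Each agent receives each resource type independently with probability RP."""
    return [{r for r in resource_types if rng.random() < rp} for _ in range(n)]

rng = random.Random(0)
edges = small_world(n=200, degree=6, beta=0.1, rng=rng)
relations = assign_relations(edges, rng)
resources = assign_resources(200, ["res1", "res2", "res3", "res4"], rp=0.6, rng=rng)
```

Rewiring replaces a link rather than adding one, so the average degree stays at the configured value of 6.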



5.2. Experimental results: closed networks 

Fig. 3(a) and (b) show the percentage profits obtained by 
K-Adapt, Learn-Adapt without Trust and Learn-Adapt with different 
resource probabilities (RP), compared with the maximum profit, which 
is gained by Central. The x-axis represents the number of simulation 
runs and each run consists of 50,000 time steps. It can be seen that 
Learn-Adapt performs consistently better than both K-Adapt and 
Learn-Adapt without Trust in all situations. In Fig. 3(a) (RP = 0.1), with 
more simulation runs, the difference between Learn-Adapt and 
K-Adapt gradually increases. This is because when each agent has 
very few resources, agents have to allocate tasks to others. Thus, an 
effective network structure could reduce agents' communication cost 
and management cost, and could further raise the profit of the entire 
network. In this case, a smarter mechanism could bring better 
performance by generating a more effective network structure. With 
the increase of resource probability (Fig. 3(b)), K-Adapt, Learn-Adapt 
without Trust and Learn-Adapt all achieve better performance. This 
can be explained by the fact that with higher resource probability, 
each agent would have more resources and, thus, could fulfill more 
tasks by itself. It should also be noted that the difference between 
Learn-Adapt and K-Adapt narrows as the resource probability rises. 
This is due to the fact that when agents become more homogeneous, a 
smarter method cannot bring correspondingly more profit for the 
network. Furthermore, Fig. 3(c) shows the time consumption of 
K-Adapt, Learn-Adapt without Trust and Learn-Adapt. It can be seen that 
Learn-Adapt consumes as little time as K-Adapt, while 
Learn-Adapt without Trust consumes more time. This can be ascribed 
to the fact that Learn-Adapt without Trust requires a time-consuming 
learning round in order to choose an optimal action to adapt agent 
relations. Although Learn-Adapt also needs the time-consuming 
learning round, after integrating the trust model, Learn-Adapt curtails 
the time consumption markedly, as the trust model can help agents 
choose the most valuable candidates and so reduce the size of the 
candidate set, as shown in Algorithm 1. 

Fig. 3. Performance of three mechanisms in a closed agent network. 

Table 3 
Parameters setting. 

Parameters                        Values         Explanations 
$n$                               200            The number of agents 
$deg$                             6              The average number of neighbors 
$RP$                              0.1 and 0.6    Resource Probability 
$m$                               50,000         The number of time steps 
$\alpha$                          0.2            Learning rate 
$\epsilon$                        0.4            Action selection distribution probability 
$k$                               100            Learning rounds 
$\rho_1$, $\rho_2$, $\rho_4$      0.7, 0.5, 1    Thresholds for choosing agents to adapt 
$\rho_3$                          5              Threshold for adapting relation strength 

Fig. 4. Performance of three mechanisms in an open agent network. 

5.3. Experimental results: open network 

The experimental setup for an open agent network is analogous to 
that of the closed one. After each simulation run, 10 new agents join the 
network and the resource assignments of these new agents are also 
based on Resource Probability, as described in Section 5.1. Meanwhile, 
10 existing agents leave the network. If these leaving agents have 
uncompleted subtasks, they transfer those uncompleted subtasks 
to their neighbors uniformly. 



In the open agent network, we again find that Learn-Adapt performs 
better than both K-Adapt and Learn-Adapt without Trust (Fig. 4(a) and 
(b)), which shows a similar trend to that in the closed agent network. 
However, in Fig. 4(c), the time consumption of the three mechanisms 
converges relatively slowly. This is because the agent network is open, 
with new agents arriving and existing agents leaving, so the network is 
always in a dynamic state and is difficult to stabilize. In addition, 
time consumption in the open network is higher than in the closed 
network. This is because, in the open network, new agents join at every 
round and have to start building relations with existing agents, which 
costs extra time. Similarly, when existing agents leave, their neighbors 
may have to re-form relations with other agents, which costs extra time 
as well. By contrast, in the closed network, no agents join or leave, so 
this time is saved. 

Finally, we are also interested in whether the introduction of 
weighted relations brings a performance improvement over crisp 
relations in the open agent network. We therefore compare the 
Learn-Adapt mechanism with Crisp Learn-Adapt, which considers only crisp 
relations by setting the relation strength between two agents a_i and 
a_j to 1 in both directions if there is a relation between them, and to 
0 otherwise. The experimental result is shown in Fig. 5 with Resource 
Probability RP = 0.6. It can be seen that introducing weighted relations 
in the network improves the performance of Learn-Adapt. This can be 
explained as follows. In Learn-Adapt, an agent can keep a neighbor that 
could not complete enough subtasks in one simulation run by weakening 
the relation strength rather than removing the relation. Such a neighbor 
might do better in the next simulation run, since future tasks may 
require the resources that it possesses. In Crisp Learn-Adapt, however, 
agents may directly cut those incompetent neighbors and thus lose future 
task allocation opportunities. 
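
The difference between weakening and cutting a relation can be illustrated with a toy example. This is only an illustrative sketch under stated assumptions: the update rules, the step size of 0.25 and the floor of 0.25 are hypothetical and are not the adaptation actions learned by Learn-Adapt. They only show why a weakened relation can recover while a cut one cannot.

```python
def weighted_update(strength, did_well, step=0.25, floor=0.25):
    """Hypothetical weighted rule: strengthen or weaken, but never cut."""
    if did_well:
        return min(1.0, strength + step)
    return max(floor, strength - step)

def crisp_update(strength, did_well):
    """Hypothetical crisp rule: the relation is either present (1) or cut (0).

    In this sketch a relation that has been cut is not formed again.
    """
    return 1.0 if (strength > 0 and did_well) else 0.0

def simulate(update, outcomes, start=1.0):
    """Apply an update rule over consecutive simulation runs."""
    strength, history = start, []
    for did_well in outcomes:
        strength = update(strength, did_well)
        history.append(strength)
    return history

# A neighbor that underperforms in run 1 but is useful again in run 2.
outcomes = [False, True]
weighted_history = simulate(weighted_update, outcomes)  # [0.75, 1.0]
crisp_history = simulate(crisp_update, outcomes)        # [0.0, 0.0]
```

With weighted relations the neighbor is retained at reduced strength and recovers in the second run. With crisp relations it is removed after the first run, so the second-run opportunity is lost.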

In summary, the performance of our Learn-Adapt is around 85%-95% 
of the upper bound set by the centralized allocation method, and on 
average 10% better than K-Adapt, which demonstrates that our mechanism 
is better than the K-Adapt mechanism in some aspects. In addition, we 
compared Learn-Adapt with its two simplified versions, i.e., Learn-Adapt 
without Trust and Crisp Learn-Adapt, in order to demonstrate the 
importance of the two introduced concepts, namely the trust model and 
weighted relations. Furthermore, we find that integrating the trust 
model lowers the time consumption significantly, which again shows that 
the trust model is crucial for this good performance. 



6. A potential application scenario 



In this section, we present a potential application scenario of our self- 
organization mechanism, i.e., packet routing in wireless sensor networks. 

Wireless sensor networks are networks of embedded sensors, actuators 
and processors that are connected in a wireless and ad hoc manner [38]. 
This combination of wireless and data networking results in a new 
computational paradigm, which is more communication-centric than any 
other computer network. Wireless sensor networks are part of a growing 
collection of information technology constructs that are moving away 
from the traditional desktop wired network architecture towards a more 
ubiquitous and universal mode of information connectivity. Wireless 
sensor networks can be used for combat field surveillance, intrusion 
detection, widespread environmental sampling and health monitoring. In 
this paper, a wireless sensor network refers to a group of sensors, or 
nodes, linked by a wireless medium, such as infrared devices or radio. 

One of the most important issues in wireless sensor network design 
is the packet routing protocol, which aims to deliver data to 
destinations effectively and efficiently. Many researchers have devoted 
considerable effort to this issue. Akkaya and Younis [2] surveyed recent 
routing protocols for wireless sensor networks and classified them into 
three categories as follows. 

1. Data centric protocols, where nodes send queries to certain regions 
and wait for data from the sensors located in the selected regions. 
SPIN [14] is the first data centric protocol, which considers data nego- 
tiation between nodes in order to eliminate redundant data and save 
energy. 

2. Hierarchical protocols. Since a single-tier network can cause the 
gateway to overload with an increase in sensor density, network 
clustering has been pursued in some routing approaches. The main 
aim of hierarchical routing is to efficiently maintain the energy con- 
sumption of sensor nodes by involving them in multi-hop communi- 
cation within a particular cluster and by performing data aggregation 
and fusion in order to decrease the number of transmitted messages. 
LEACH [13] is one of the first hierarchical routing approaches for sen- 
sor networks. The idea proposed in LEACH is to form clusters of the 
sensor nodes based on the received signal strength and use local 
cluster heads as routers to transmit messages. 

3. Location-based protocols. In most cases, location information is 
necessary in order to calculate the distance between two particular 
nodes. Geographic Adaptive Fidelity (GAF) [45] is an energy-aware 
location-based routing protocol devised primarily for mobile ad hoc 
networks, but it is also applicable to sensor networks. It forms a 
virtual grid for the covered area, and each node uses its GPS-indicated 
location to associate itself with a point in the virtual grid. Then, the 
distance between two nodes can be calculated according to their GPS 
coordinates. 
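
The grid association and distance calculation mentioned for location-based protocols can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not GAF itself: the function names and the cell size are hypothetical, and positions are assumed to be already projected onto planar coordinates in meters.

```python
import math

def grid_cell(x, y, cell_size):
    """Map a planar position to the index of its virtual grid cell."""
    return (int(x // cell_size), int(y // cell_size))

def distance(p, q):
    """Euclidean distance between two planar positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Usage with a hypothetical 10 m cell size.
cell = grid_cell(12.0, 37.5, 10.0)        # (1, 3)
d = distance((0.0, 0.0), (3.0, 4.0))      # 5.0
```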

Each of these routing protocols has both advantages and 
disadvantages. It is nearly impossible to develop a perfect routing 
protocol that overcomes all the disadvantages of other protocols, 
because some attributes of a routing protocol conflict with others, 
e.g., energy saving versus delivery success ratio. Thus, researchers 
usually focus their design on some specific attributes, e.g., network 
throughput, delivery success ratio, communication overhead, packet 
delay, etc. With this motivation, we intend to employ our 
self-organization mechanism to enhance current routing protocols, 
instead of proposing a novel and specific protocol. The concept of 
self-organization was introduced into wireless sensor networks several 
years ago. For instance, Karnik and Kumar [23] introduced 
self-organization into wireless sensor networks to optimize network 
throughput by building an optimal topology and tuning network access 
parameters, such as the transmission attempt rate. Nonetheless, the 
topology of a wireless sensor network sometimes cannot be altered 
arbitrarily. In environmental sampling sensor networks, for example, the 
areas of interest are usually fixed, so the topology cannot be adapted 
freely. 

Within this constraint, we propose a two-layer architecture to im- 
prove the routing performance, where the first layer is the wireless sen- 
sor network while the second layer is a cooperation network as shown 
in Fig. 6. In the first layer, sensors, represented as agents, are connected 
by some wireless medium as described earlier, so this layer is a physical 
network. In the second layer, agents are linked by weighted cooperation 
relations as described in Section 1, so this layer is an abstract network. 



The cooperation network is formed through the agents' past 
cooperation. For example, in Fig. 6, if many packets sent from agent 1 
have to be forwarded by agent 7, agent 1 will add agent 7 as one of its 
cooperation neighbors, and the relation could be assigned as the 
subordinate-superior relation. Then, in the future, if agent 1 has 
packets to send, it sends them directly to agent 7, and agent 7 then 
forwards the packets to the destinations on behalf of agent 1. For both 
steps, i.e., agent 1 sending packets to agent 7 and agent 7 forwarding 
them to the destinations, agents 1 and 7 can use any existing routing 
protocol. Thus, the cooperation network layer is used only to guide the 
packet routing process in the wireless sensor network layer in order to 
improve routing efficiency, rather than acting as a concrete routing 
protocol that operates directly in the wireless sensor network layer. 
Consequently, any of the current routing protocols can be employed in 
the wireless sensor network layer. Our self-organization mechanism works 
on the cooperation network layer, where it enables each agent to keep 
the most useful cooperation neighbors, which in turn helps to achieve 
better routing performance. In this scenario, each packet can be modeled 
as a token as described in Section 2, and the agents that can forward 
many tokens are considered to be useful. We consider this idea 
promising, as it could probably enhance many current routing protocols. 
Currently, we are building a theoretical model for this work, and we 
will test it within a network simulator once the theoretical work is 
done. 
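
The formation of cooperation relations from past forwarding behavior could be sketched as follows. This is only an illustrative sketch under stated assumptions: the class `CooperationLayer`, the threshold `min_forwards`, and the use of the forwarded fraction as the relation weight are hypothetical choices made here. The relation type (e.g., subordinate-superior) is not modeled.

```python
from collections import Counter, defaultdict

class CooperationLayer:
    """Tracks which agents frequently forward packets for which sources."""

    def __init__(self, min_forwards=3):
        # Hypothetical threshold: forwards needed before a relation is formed.
        self.min_forwards = min_forwards
        self.sent = Counter()                 # source -> packets sent
        self.forwards = defaultdict(Counter)  # source -> relay -> packets relayed

    def record_route(self, route):
        """Record one delivered packet; route lists agent ids, source first."""
        source, relays = route[0], route[1:-1]
        self.sent[source] += 1
        for relay in set(relays):  # count each relay once per packet
            self.forwards[source][relay] += 1

    def neighbors(self, source):
        """Cooperation neighbors of source, weighted by forwarded fraction."""
        total = self.sent[source]
        return {relay: count / total
                for relay, count in self.forwards[source].items()
                if count >= self.min_forwards}

# Usage: agent 7 forwards three of the four packets sent by agent 1.
layer = CooperationLayer(min_forwards=3)
for route in ([1, 3, 7, 9], [1, 7, 9], [1, 4, 7, 8], [1, 2, 5]):
    layer.record_route(route)
coop = layer.neighbors(1)  # {7: 0.75}
```

Here agent 1 would keep only agent 7 as a cooperation neighbor. The underlying routes themselves are assumed to come from whichever routing protocol runs in the wireless sensor network layer.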

7. Conclusion 

This paper introduced a composite self-organization mechanism 
that adapts relations among agents in a network to achieve efficient 
task allocation. The mechanism integrates a trust model, a multi-agent 
Q-learning algorithm and the concept of weighted relations, which 
together resulted in good performance compared with another well-known 
self-organization mechanism, K-Adapt. Since the mechanism is 
decentralized and continuous over time, it meets the principles of 
self-organization defined by Serugendo et al. [34]. In addition, this 
paper gave a potential application scenario of the proposed 
self-organization mechanism, which suggests that it could be applied in 
some real-world cases. 

Since the trust model and the learning algorithm are complementary 
parts of our mechanism, a more effective trust model and a more 
efficient learning algorithm could improve its performance. Thus, 
devising effective trust models and learning algorithms is one stream of 
future work to refine our mechanism. Specifically, we will 
quantitatively compare different trust models for candidate selection to 
find which one is best for our problem, and we will develop a new trust 
model based on the analysis of existing ones. Furthermore, we intend to 
enhance the self-organization mechanism by enabling it to handle dynamic 
cases, where new types of resources are introduced into the agent 
network and some existing resources are phased out. It would also be 
interesting to introduce asymmetric relation strength into our agent 
network model, e.g., an agent A likes another agent B with degree 0.7, 
but B likes A with degree only 0.3. We believe that introducing such 
asymmetric relation strength will make our self-organization mechanism 
more flexible and more suitable for real-world problems. Finally, as 
stated in the previous section, we will apply our self-organization 
mechanism to enhance packet routing in wireless sensor networks. 